Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use bwa-aln-interactive and upgrade developer's docs #32

Merged
merged 17 commits into from
Oct 16, 2024

Conversation

clintval
Copy link
Member

@clintval clintval commented Sep 20, 2024

Closes #5
Closes #30
Closes #27

This PR also:

  • Addresses improper installation of poetry.
  • Adds some polish to the main README and developer documentation.

@clintval
Copy link
Member Author

clintval commented Sep 20, 2024

Local and remote tests hang forever when you install bwa-aln-interactive from bioconda and run tests. The executable from bioconda is either broken or this library isn't compatible with it yet.

Screenshot 2024-09-19 at 7 33 29 PM

My full stack trace is:

Stack Trace
<doctest prymer.offtarget.bwa[3]>:1:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <prymer.offtarget.bwa.BwaAlnInteractive object at 0x13e8920d0>
ref = PosixPath('tests/offtarget/data/miniref.fa'), max_hits = 1, executable = 'bwa-aln-interactive'
max_mismatches = 3, max_mismatches_in_seed = 3, max_gap_opens = 0, max_gap_extensions = -1, seed_length = 20
reverse_complement = False, include_alt_hits = False, threads = 16

    def __init__(
        self,
        ref: Path,
        max_hits: int,
        executable: str | Path = "bwa-aln-interactive",
        max_mismatches: int = 3,
        max_mismatches_in_seed: int = 3,
        max_gap_opens: int = 0,
        max_gap_extensions: int = -1,
        seed_length: int = 20,
        reverse_complement: bool = False,
        include_alt_hits: bool = False,
        threads: Optional[int] = None,
    ):
        """
        Args:
            ref: the path to the reference FASTA, which must be indexed with bwa.
            max_hits: the maximum number of hits to report - if more than this number of seed hits
                      are found, report only the count and not each hit.
            executable: string or Path representation of the `bwa-aln-interactive` executable path
            max_mismatches: the maximum number of mismatches allowed in the full query sequence
            max_mismatches_in_seed: the maximum number of mismatches allowed in the seed region
            max_gap_opens: the maximum number of gap opens allowed in the full query sequence
            max_gap_extensions: the maximum number of gap extensions allowed in the full query
                                sequence
            seed_length: the length of the seed region
            reverse_complement: reverse complement each query sequence before alignment
            include_alt_hits: if true include hits to references with names ending in _alt,
                              otherwise do not include them.
            threads: the number of threads to use.  If `None`, use all available CPUs.
        """
        threads = os.cpu_count() if threads is None else threads
        executable_path = ExecutableRunner.validate_executable_path(executable=executable)
        self.reverse_complement: bool = reverse_complement
        self.include_alt_hits: bool = include_alt_hits
        self.max_hits: int = max_hits

        missing_aux_paths = []
        for aux_ext in BWA_AUX_EXTENSIONS:
            aux_path = Path(f"{ref}{aux_ext}")
            if not aux_path.exists():
                missing_aux_paths.append(aux_path)
        if len(missing_aux_paths) > 0:
            message: str
            if len(missing_aux_paths) > 1:
                message = "BWA index files do not exist:\n\t"
            else:
                message = "BWA index file does not exist:\n\t"
            message += "\t\n".join(f"{p}" for p in missing_aux_paths)
            raise FileNotFoundError(f"{message}\nPlease index with: `{executable_path} index {ref}`")

        # -N = non-iterative mode: search for all n-difference hits (slooow)
        # -S = output SAM (run samse)
        # -Z = interactive mode (no input buffer and force processing with empty lines between recs)
        command: list[str] = [
            f"{executable_path}",
            "aln",
            "-t",
            f"{threads}",
            "-n",
            f"{max_mismatches}",
            "-o",
            f"{max_gap_opens}",
            "-e",
            f"{max_gap_extensions}",
            "-l",
            f"{seed_length}",
            "-k",
            f"{max_mismatches_in_seed}",
            "-X",
            f"{max_hits}",
            "-N",
            "-S",
            "-Z",
            "-D",
            f"{ref}",
            "/dev/stdin",
        ]

        super().__init__(command=command)

        # HACK ALERT
        # This is a hack. By trial and error, pysam.AlignmentFile() will block reading unless
        # there's at least a few records due to htslib wanting to read a few records for format
        # auto-detection. Lame.  So a hundred queries are sent to the aligner to align enable the
        # htslib auto-detection to complete, and for us to be able to read using pysam.
        num_warmup: int = 100
        for i in range(num_warmup):
            query = Query(id=f"ignoreme:{i}", bases="A" * 100)
            fastq_str = query.to_fastq(reverse_complement=self.reverse_complement)
            self._subprocess.stdin.write(fastq_str)
        self.__signal_bwa()  # forces the input to be sent to the underlying process.
>       self._reader = sam.reader(path=self._subprocess.stdout, file_type=sam.SamFileType.SAM)

prymer/offtarget/bwa.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = <_io.TextIOWrapper name=20 encoding='UTF-8'>, file_type = <SamFileType.SAM: ('', '.sam')>, unmapped = False

    def reader(
        path: SamPath, file_type: Optional[SamFileType] = None, unmapped: bool = False
    ) -> SamFile:
        """Opens a SAM/BAM/CRAM for reading.

        To read from standard input, provide any of `"-"`, `"stdin"`, or `"/dev/stdin"` as the input
        `path`.

        Args:
            path: a file handle or path to the SAM/BAM/CRAM to read or write.
            file_type: the file type to assume when opening the file.  If None, then the file
                type will be auto-detected.
            unmapped: True if the file is unmapped and has no sequence dictionary, False otherwise.
        """
>       return _pysam_open(path=path, open_for_reading=True, file_type=file_type, unmapped=unmapped)

../../../.miniforge-x86/envs/prymer/lib/python3.12/site-packages/fgpyo/sam/__init__.py:309:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = <_io.TextIOWrapper name=20 encoding='UTF-8'>, open_for_reading = True
file_type = <SamFileType.SAM: ('', '.sam')>, unmapped = False, kwargs = {'mode': 'r'}

    def _pysam_open(
        path: SamPath,
        open_for_reading: bool,
        file_type: Optional[SamFileType] = None,
        unmapped: bool = False,
        **kwargs: Any,
    ) -> SamFile:
        """Opens a SAM/BAM/CRAM for reading or writing.

        This function permits reading from standard input and writing to standard output. The specified
        path may be the UNIX conventional `"-"`, the more explicit `"stdin"` or `"stdout"`, or an
        absolute path to either of the standard streams `"/dev/stdin"` or `"/dev/stdout"`.

        When writing to standard output, the file type must be specified.

        Args:
            path: a file handle or path to the SAM/BAM/CRAM to read or write.
            open_for_reading: True to open for reading, false otherwise.
            file_type: the file type to assume when opening the file.  If None, then the file type
                will be auto-detected for reading and must be a path-like object for writing.
            unmapped: True if the file is unmapped and has no sequence dictionary, False otherwise.
            kwargs: any keyword arguments to be passed to
            `pysam.AlignmentFile`; may not include "mode".
        """

        if isinstance(path, (str, Path)):
            if str(path) in _STDIN_PATHS and open_for_reading:
                path = sys.stdin
            elif str(path) in _STDOUT_PATHS and not open_for_reading:
                assert file_type is not None, "Must specify file_type when writing to standard output"
                path = sys.stdout
            else:
                file_type = file_type or SamFileType.from_path(path)
                path = str(path)
        elif not isinstance(path, _IOClasses):  # type: ignore
            open_type = "reading" if open_for_reading else "writing"
            raise TypeError(f"Cannot open '{type(path)}' for {open_type}.")

        if file_type is None and not open_for_reading:
            raise ValueError("file_type must be given when writing to a file-like object")

        # file_type must be set when writing, so if file_type is None, then we must be opening it
        # for reading.  Hence, only set mode in kwargs to pysam when file_type is set and when
        # writing since we can let pysam auto-recognize the file type when reading.  See discussion:
        # https://github.com/pysam-developers/pysam/issues/655
        if file_type is not None:
            kwargs["mode"] = "r" if open_for_reading else "w" + file_type.mode
        else:
            assert open_for_reading, "Bug: file_type was None but open_for_reading was False"

        if unmapped and open_for_reading:
            kwargs["check_sq"] = False

        # Open it alignment file, suppressing stderr in case index files are older than SAM file
        with fgpyo.io.suppress_stderr():
>           alignment_file = pysam.AlignmentFile(path, **kwargs)
E           KeyboardInterrupt

../../../.miniforge-x86/envs/prymer/lib/python3.12/site-packages/fgpyo/sam/__init__.py:290: KeyboardInterrupt

@clintval clintval changed the title docs: use bwa-aln-interactive and upgrade developer's docs feat use bwa-aln-interactive and upgrade developer's docs Sep 20, 2024
@clintval clintval changed the title feat use bwa-aln-interactive and upgrade developer's docs feat: use bwa-aln-interactive and upgrade developer's docs Sep 20, 2024
@clintval
Copy link
Member Author

clintval commented Sep 20, 2024

EDIT: I solved the issue. See later comments.

I've narrowed the issue down to this "hack" which does not appear to work for me:

# HACK ALERT
# This is a hack. By trial and error, pysam.AlignmentFile() will block reading unless
# there's at least a few records due to htslib wanting to read a few records for format
# auto-detection. Lame. So a hundred queries are sent to the aligner to align enable the
# htslib auto-detection to complete, and for us to be able to read using pysam.
num_warmup: int = 100
for i in range(num_warmup):
query = Query(id=f"ignoreme:{i}", bases="A" * 100)
fastq_str = query.to_fastq(reverse_complement=self.reverse_complement)
self._subprocess.stdin.write(fastq_str)
self.__signal_bwa() # forces the input to be sent to the underlying process.
self._reader = sam.reader(path=self._subprocess.stdout, file_type=sam.SamFileType.SAM)
for _ in range(num_warmup):
next(self._reader)

Increasing the records for warmup doesn't resolve the issue.

This is difficult to debug because stderr is suppressed for everything (bwa, htslib, etc.).

Forcing flush/unbufferd IO with PYTHONUNBUFFERED=1 doesn't work either.


Although it initially appears pysam AlignmentFile is to blame (hence the hack), I have discovered the issue is not actually with pysam at all. If I remove the dependency on AlignmentFile and process lines into AlignmentHeader and AlignedSegment objects directly (using the from_text() and from_string() class methods) then I see that bwa-aln-interactive does not flush output immediately. It appears to buffer and hang at times. The number of threads you assign to the executable do matter!

Removing the "hack" is easy and I suggest we do it anyway.

But it exposes an issue with bwa-aln-interactive.

@clintval
Copy link
Member Author

clintval commented Sep 20, 2024

I figured it out. The GitHub release of bwa-aln-interactive is broken and we are shipping the non-interactive bwa. Next, I will fix up the public repository and the bioconda package by incrementing the build number for bwa-aln-interactive.

As a side quest, I was able to remove the AlignmentFile "hack"!

I wish prymer had better logging redirects for the underlying code it calls. Debugging this was painful without knowing where all the standard errors go (hint... it's /dev/null and the user can't change that).

@clintval clintval mentioned this pull request Sep 20, 2024
@msto
Copy link
Collaborator

msto commented Sep 23, 2024

I wish prymer had better logging redirects for the underlying code it calls. Debugging this was painful without knowing where all the standard errors go (hint... it's /dev/null and the user can't change that).

Can you open a follow-up issue for this? I agree

@clintval
Copy link
Member Author

I wish prymer had better logging redirects for the underlying code it calls. Debugging this was painful without knowing where all the standard errors go (hint... it's /dev/null and the user can't change that).

Can you open a follow-up issue for this? I agree

You have one open for Primer3:

I wrote one up for bwa:

Copy link

codecov bot commented Oct 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.77%. Comparing base (93ea6e7) to head (d997e13).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #32   +/-   ##
=======================================
  Coverage   96.76%   96.77%           
=======================================
  Files          26       26           
  Lines        1701     1703    +2     
  Branches      328      327    -1     
=======================================
+ Hits         1646     1648    +2     
  Misses         29       29           
  Partials       26       26           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@clintval clintval marked this pull request as ready for review October 4, 2024 18:19
Comment on lines 309 to 311
self._subprocess.stdin.flush()
self._subprocess.stdin.write("\n" * 16)
self._subprocess.stdin.flush()
Copy link
Member Author

@clintval clintval Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Bioconda built executable requires more newlines before emitting an alignment.

Why the executable needs more than 2 newlines is currently a mystery.

.github/workflows/tests.yml Outdated Show resolved Hide resolved
prymer/offtarget/bwa.py Show resolved Hide resolved
@msto msto assigned clintval and unassigned tfenne and emmcauley Oct 7, 2024
@clintval clintval assigned tfenne and emmcauley and unassigned clintval Oct 7, 2024
@clintval
Copy link
Member Author

clintval commented Oct 7, 2024

@msto's tested the developer docs and tests passed.

I think this is ready for review!

@clintval
Copy link
Member Author

clintval commented Oct 15, 2024

@coderabbitai full review

Copy link

coderabbitai bot commented Oct 15, 2024

✅ Actions performed

Full review triggered.

Copy link

coderabbitai bot commented Oct 15, 2024

Walkthrough

The pull request introduces several modifications across multiple files to enhance the project's configuration and documentation. Key changes include updates to the GitHub Actions workflow for testing, revisions to the .gitignore file, and improvements to the README.md and installation documentation. The prymer.yml file sees a shift in dependency management, while the bwa.py and offtarget_detector.py files update the executable name for BWA alignment tools. Overall, these changes aim to streamline installation processes and improve clarity in documentation.

Changes

File Change Summary
.github/workflows/tests.yml Removed checkout step, modified channel order, added Poetry installation, consolidated dependency steps, renamed testing step, and added documentation build step.
.gitignore Added entries for various temporary files, build artifacts, and environment-specific files.
README.md Updated badge links, enhanced installation instructions, and replaced the "Quick setup" section.
docs/index.md Updated link for the "Installation" section.
docs/installation-and-developers-documentation.md Created new documentation file detailing installation and development setup for prymer.
prymer.yml Updated dependencies, removed old ones, and added new dependencies from bioconda.
prymer/offtarget/bwa.py Changed executable name in BwaAlnInteractive class, updated error messages, and improved documentation.
prymer/offtarget/offtarget_detector.py Updated executable parameter in OffTargetDetector class and enhanced class documentation.
pyproject.toml Modified pytest options for coverage reporting and updated linting configurations.

Assessment against linked issues

Objective Addressed Explanation
Add prymer to bioconda (#5)
Update README to refer to bioconda package for bwa-aln-interactive (#30)
GitHub Action tests are using deprecated Mambaforge (#27)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (13)
docs/index.md (1)

7-7: LGTM! Consider updating the section title for clarity.

The update to the installation documentation link is appropriate and aligns with the PR objectives of enhancing developer documentation. This change will help users find the expanded information more easily.

Consider updating the section title to reflect the expanded content:

-* [Installation](installation-and-developers-documentation.md)
+* [Installation & Developer Guide](installation-and-developers-documentation.md)

This change would make it immediately clear to users that the section now includes both installation instructions and developer documentation.

.github/workflows/tests.yml (1)

50-52: Excellent updates to testing and documentation building!

The simplification of the test command to poetry run pytest is a good improvement, making the workflow more straightforward.

The addition of a documentation build test with poetry run mkdocs build --strict is a great practice. It ensures that the documentation can be built successfully and catches potential issues early. The use of the --strict flag is particularly commendable as it treats warnings as errors, promoting high-quality documentation.

Suggestion for improvement:
Consider adding the --cov flag to the pytest command to generate coverage reports. This would align with the subsequent step that uploads coverage reports to Codecov.

Example:

run: poetry run pytest --cov=./ --cov-report=xml

This change would ensure that coverage data is generated during the test run.

Also applies to: 59-61

.gitignore (2)

Line range hint 88-116: Consider uncommenting specific environment management patterns

The patterns for pyenv, pipenv, poetry, and pdm are commented out. Depending on your project's needs, you might want to uncomment some of these.

For example, if you're using Poetry, you might want to uncomment:

-#poetry.lock
+poetry.lock

This ensures consistent dependency versions across different development environments.


Line range hint 162-167: Consider uncommenting PyCharm-specific ignore pattern

The PyCharm-specific ignore pattern is commented out. If your team uses PyCharm, you might want to uncomment this line.

If PyCharm is commonly used in your development environment, consider uncommenting:

-#.idea/
+.idea/

This will ignore PyCharm project settings, which are typically user-specific.

README.md (2)

20-34: Great improvements to installation and download information!

The updated badges for Bioconda and PyPI installation, along with the addition of download statistics, provide valuable information to users and potential contributors. This enhances the project's credibility and makes it easier for users to choose their preferred installation method.

Consider adding a brief explanation in the README about the significance of Bioconda for bioinformatics projects, as not all users may be familiar with it.


48-62: Great improvements to the installation instructions!

The new "Recommended Installation" section provides clear and concise instructions, addressing the PR objectives. The explicit mention of prerequisites and the use of Bioconda for installation simplify the process for users.

Consider adding a brief note about alternative installation methods (e.g., PyPI) for users who may not prefer or have access to Bioconda. This could be a simple line like:
"For alternative installation methods, please refer to the [developer's instructions][developers-instructions-link]."

docs/installation-and-developers-documentation.md (4)

13-39: LGTM: Comprehensive development setup instructions.

The development setup instructions are clear, comprehensive, and well-organized. They cover all necessary steps from environment setup to package installation.

Consider adding a note about potential troubleshooting steps or common issues that developers might encounter during the setup process.


41-54: LGTM: Clear instructions for testing and linting.

The build checking section provides clear instructions for running tests, including mypy checks, ruff checks, and unit tests. The separation of formatting and linting commands is appropriate.

Consider adding a brief explanation of what each tool (mypy, ruff, pytest) does for developers who might be unfamiliar with them.


64-73: LGTM: Well-documented release process.

The release process is well-documented and follows best practices for version control and release management. The steps are clear and easy to follow.

Consider adding a note about the importance of updating the changelog before creating a release, as this is often overlooked but crucial for maintaining a good release history.


74-92: LGTM: Comprehensive release automation and versioning explanation.

The explanation of Semantic Versioning and the description of the automated release process are clear and comprehensive. The note about editing the changelog is important and well-placed.

Regarding the static analysis hints:

  1. The phrase "In brief" (line 76) is concise and appropriate in this context. No change needed.
  2. Consider hyphenating "backwards-compatible" in lines 79 and 80 for improved readability:
    -* `MINOR` version when you add functionality in a backwards compatible manner
    -* `PATCH` version when you make backwards compatible bug fixes
    +* `MINOR` version when you add functionality in a backwards-compatible manner
    +* `PATCH` version when you make backwards-compatible bug fixes
🧰 Tools
🪛 LanguageTool

[style] ~76-~76: ‘In brief’ might be wordy. Consider a shorter alternative.
Context: ...tic Versioning](https://semver.org/). > In brief: > > * MAJOR version when you make i...

(EN_WORDINESS_PREMIUM_IN_BRIEF)


[uncategorized] ~79-~79: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...version when you add functionality in a backwards compatible manner > * PATCH version when you mak...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~80-~80: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...anner > * PATCH version when you make backwards compatible bug fixes GitHub Actions will take car...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

prymer/offtarget/bwa.py (3)

18-23: Excellent addition of installation information.

The new documentation clearly explains the requirement for bwa-aln-interactive and provides helpful links for installation. This improves the usability of the module.

Consider adding a brief note about the purpose or advantages of using bwa-aln-interactive over standard bwa to provide context for why this specific version is required.


203-206: Good addition of reference links in the class docstring.

The added links to the GitHub repository and Bioconda package in the BwaAlnInteractive class docstring provide valuable context and resources for users.

Consider using a more concise format for the links, such as inline links, to improve readability. For example:

See [bwa-aln-interactive on GitHub](https://github.com/fulcrumgenomics/bwa-aln-interactive) and its [Bioconda package](https://bioconda.github.io/recipes/bwa-aln-interactive/README.html).

308-313: Improved documentation for BWA signaling, but further clarification needed.

The added comments provide helpful context about the need for multiple newlines when signaling BWA. However, the exact reason for using 16 newlines and the nature of the platform differences remain unclear.

To address the previous review comment and improve clarity:

  1. Consider adding a brief explanation of which platforms were tested and how they behaved differently.
  2. If possible, provide more specific information about why 16 newlines were chosen and how this number was determined.
  3. Consider adding a TODO comment to investigate and document the root cause of this platform-dependent behavior in the future.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between a132a52 and a6205c9.

📒 Files selected for processing (10)
  • .github/workflows/tests.yml (1 hunks)
  • .gitignore (1 hunks)
  • README.md (1 hunks)
  • docs/index.md (1 hunks)
  • docs/installation-and-developers-documentation.md (1 hunks)
  • docs/installation.md (0 hunks)
  • prymer.yml (1 hunks)
  • prymer/offtarget/bwa.py (6 hunks)
  • prymer/offtarget/offtarget_detector.py (2 hunks)
  • pyproject.toml (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/installation.md
🧰 Additional context used
🪛 LanguageTool
README.md

[style] ~37-~37: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 706 characters long)
Context: ...python-tests-badge]][python-tests-link] [![publish prymer][publish-prymer-badge]][...

(EN_EXCESSIVE_EXCLAMATION)

docs/installation-and-developers-documentation.md

[style] ~76-~76: ‘In brief’ might be wordy. Consider a shorter alternative.
Context: ...tic Versioning](https://semver.org/). > In brief: > > * MAJOR version when you make i...

(EN_WORDINESS_PREMIUM_IN_BRIEF)


[uncategorized] ~79-~79: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...version when you add functionality in a backwards compatible manner > * PATCH version when you mak...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~80-~80: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...anner > * PATCH version when you make backwards compatible bug fixes GitHub Actions will take car...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🔇 Additional comments (31)
prymer.yml (2)

6-8: Approve new dependencies and channel specifications.

The new dependencies align well with the project objectives:

  1. The addition of bwa-aln-interactive from Bioconda addresses the integration requirement mentioned in the PR objectives.
  2. The explicit channel specifications for each package (bioconda, conda-forge) is a good practice for reproducibility.
  3. The Python dependency is now explicitly from conda-forge, which is appropriate.

Let's verify if the specified version of bwa-aln-interactive is the latest stable version:

#!/bin/bash
# Description: Check if bwa-aln-interactive 0.7.18 is the latest version on Bioconda

# Test: Search for the latest version of bwa-aln-interactive on Bioconda
conda search -c bioconda bwa-aln-interactive | tail -n 1

If a newer version is available, consider updating to the latest stable release.


6-8: Verify the removal of development dependencies.

Several development and testing tools have been removed from the conda environment file:

  • ruff (linter)
  • mypy (static type checker)
  • pytest and pytest-workflow (testing frameworks)
  • poetry and pyproject_hooks (dependency management)

While this simplifies the conda environment, it raises some concerns:

  1. How will linting, type checking, and testing be performed now?
  2. Is there an alternative method for managing development dependencies?

To ensure these tools are still available for development, let's check if they're specified elsewhere:

If these development tools are not specified elsewhere, consider:

  1. Adding them to a separate development-specific conda environment file.
  2. Specifying them in a pyproject.toml file if using Poetry for dependency management.
  3. Documenting the development setup process if these tools are intended to be installed globally or through other means.
✅ Verification successful

Development dependencies are correctly managed in pyproject.toml.

The removed development and testing tools are specified under [tool.poetry.group.dev.dependencies] in pyproject.toml, ensuring they remain available for development purposes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for development dependencies in pyproject.toml or other config files

# Test 1: Check if pyproject.toml exists and contains dev dependencies
if [ -f pyproject.toml ]; then
    echo "Contents of pyproject.toml:"
    cat pyproject.toml
fi

# Test 2: Check for any other potential config files that might specify dev dependencies
echo "Other potential config files:"
find . -maxdepth 1 -type f \( -name "*.toml" -o -name "*.ini" -o -name "*.cfg" \) -not -name "pyproject.toml"

Length of output: 226


Script:

#!/bin/bash
# Description: Check for development dependencies in common config files

# Test 1: Check if pyproject.toml exists and contains dev dependencies
if [ -f pyproject.toml ]; then
    echo "Contents of pyproject.toml:"
    cat pyproject.toml
else
    echo "pyproject.toml not found."
fi

# Test 2: Check for requirements.txt and list dev dependencies if present
if [ -f requirements.txt ]; then
    echo "Contents of requirements.txt:"
    cat requirements.txt
else
    echo "requirements.txt not found."
fi

# Test 3: Check for setup.py and list development dependencies
if [ -f setup.py ]; then
    echo "Contents of setup.py:"
    cat setup.py
else
    echo "setup.py not found."
fi

# Test 4: Search for other potential config files that might specify dev dependencies
echo "Other potential config files containing 'dev' or 'development':"
grep -rilE 'dev(dependencies)?' .

Length of output: 5349

.github/workflows/tests.yml (4)

28-32: Excellent addition of Poetry installation step!

This new step explicitly manages the Poetry installation, ensuring consistency across environments by using a specific version. The use of parallel installation is a nice touch for potential performance improvements.

This change effectively addresses the previous suggestion about managing Poetry in its own environment, albeit using a different approach than the suggested pipx.


46-48: Great simplification of dependency installation!

The change to use poetry install for dependency installation is a positive improvement. This approach is the standard and recommended way to install dependencies for Poetry-based projects, ensuring that all dependencies specified in the pyproject.toml file are correctly installed.

This simplification makes the workflow more straightforward and easier to maintain.


Line range hint 1-61: Overall excellent improvements to the workflow!

The changes made to this workflow file significantly improve its efficiency and align it with best practices:

  1. The addition of a dedicated Poetry installation step ensures consistent dependency management.
  2. Simplification of dependency installation and testing commands improves readability and maintainability.
  3. The inclusion of a documentation build test enhances the overall quality assurance process.

These modifications collectively contribute to a more robust and reliable CI/CD pipeline. The workflow now better handles dependency management, testing, and documentation building, which should lead to improved development and deployment processes.

Great job on these updates!


38-38: Verify the intention behind channel reordering.

The channels have been reordered from "conda-forge,bioconda" to "bioconda,conda-forge". This change may affect package resolution priorities, potentially prioritizing bioinformatics-specific packages from Bioconda over more general packages from conda-forge.

Please confirm if this reordering is intentional and consider any potential impacts on package versions or conflicts that might arise from this change.

✅ Verification successful

Channel reordering verified as intentional.
The channels have been reordered to prioritize "bioconda" over "conda-forge" in both the configuration file and documentation. This ensures that bioinformatics-specific packages from Bioconda are given priority, aligning with the project's dependency requirements.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any conda-related issues in the repository

# Test: Search for conda environment files and their channel specifications
echo "Searching for conda environment files and their channel specifications:"
rg --type yaml 'channels:' -A 5

# Test: Look for any explicit mentions of conda channels in Python files
echo "Checking for explicit mentions of conda channels in Python files:"
rg --type python 'conda[-_]forge|bioconda'

# Test: Search for any conda-related issues or discussions in documentation
echo "Searching for conda-related discussions in documentation:"
rg --type md 'conda[-_]forge|bioconda'

Length of output: 1679

.gitignore (11)

1-2: LGTM: General system and IDE files

The inclusion of .DS_Store and .vscode/ is appropriate for ignoring system-specific and IDE files.


Line range hint 4-13: LGTM: Python-specific patterns

This section correctly ignores common Python bytecode and compiled files.


Line range hint 15-31: LGTM: Distribution and packaging

Comprehensive coverage of distribution and packaging artifacts. This section will help keep the repository clean of build-related files.


Line range hint 33-37: LGTM: PyInstaller

Appropriate patterns for ignoring PyInstaller build artifacts.


Line range hint 39-41: LGTM: Installer logs

Correctly ignores pip-related log files.


Line range hint 43-57: LGTM: Testing and coverage

Comprehensive coverage of test-related files and directories for various testing frameworks.


Line range hint 59-61: LGTM: Translations

Appropriate patterns for ignoring translation files.


Line range hint 63-70: LGTM: Web frameworks (Django, Flask)

Correctly ignores framework-specific files and directories.


Line range hint 72-86: LGTM: Other tools (Scrapy, Sphinx, PyBuilder, Jupyter, IPython)

Good coverage of various Python-related tools and their artifacts.


Line range hint 118-160: LGTM: Other Python-related tools and frameworks

Comprehensive coverage of various Python tools, including Celery, SageMath, virtual environments, and type checkers.


Line range hint 1-167: Overall assessment: Comprehensive and well-structured .gitignore file

This .gitignore file is comprehensive and well-structured, covering a wide range of Python development scenarios. It will effectively keep the repository clean of unnecessary files and build artifacts.

A few minor suggestions:

  1. Consider uncommenting specific environment management patterns (e.g., poetry.lock) if they apply to your project.
  2. If PyCharm is used in your development environment, consider uncommenting the .idea/ pattern.

These minor adjustments can further tailor the .gitignore file to your project's specific needs.

README.md (3)

5-18: Excellent updates to the project badges!

The addition of MyPy, Poetry, and Ruff badges significantly improves the README by showcasing the project's commitment to modern development practices. The reorganization enhances clarity and provides valuable information to potential contributors and users.


37-46: Excellent addition of CI/CD and code quality badges!

The new badges for tests, publishing, and code coverage provide crucial information about the project's quality and development processes. This transparency is valuable for both users and potential contributors, instilling confidence in the project's reliability and maintenance.

🧰 Tools
🪛 LanguageTool

[style] ~37-~37: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 706 characters long)
Context: ...python-tests-badge]][python-tests-link] [![publish prymer][publish-prymer-badge]][...

(EN_EXCESSIVE_EXCLAMATION)


1-62: Excellent updates to the README.md file!

The changes made to this file significantly enhance its quality and usefulness:

  1. The updated badges provide a comprehensive overview of the project's quality, development practices, and installation options.
  2. The new "Recommended Installation" section addresses Issue Update README to refer to bioconda package for bwa-aln-interactive #30 by clearly referencing the Bioconda package and providing specific installation instructions.
  3. The promotion of Bioconda for installation indirectly supports Issue Add prymer to bioconda #5 by encouraging users to use this bioinformatics-focused package manager.
  4. The overall structure and content of the README now provide a more informative and user-friendly introduction to the project.

These improvements align well with the PR objectives and will greatly benefit both new and existing users of the project.

🧰 Tools
🪛 LanguageTool

[style] ~37-~37: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 706 characters long)
Context: ...python-tests-badge]][python-tests-link] [![publish prymer][publish-prymer-badge]][...

(EN_EXCESSIVE_EXCLAMATION)

docs/installation-and-developers-documentation.md (3)

1-11: LGTM: Clear and concise installation instructions.

The recommended installation section provides clear and concise instructions for installing prymer using bioconda. It correctly mentions the dependencies and provides the exact command for installation.


56-62: LGTM: Concise documentation building instructions.

The documentation building section provides clear and concise instructions for building and serving the documentation using mkdocs. The combined command for building and serving is efficient.


1-92: Overall: Excellent documentation for installation and development.

This new documentation file provides comprehensive and well-structured guidance for installing, developing, and releasing the prymer package. It covers all necessary aspects, including environment setup, testing, documentation building, and the release process. The instructions are clear, concise, and follow best practices.

A few minor suggestions have been made to further enhance the documentation, but overall, this is an excellent addition to the project that will greatly benefit both users and developers.

🧰 Tools
🪛 LanguageTool

[style] ~76-~76: ‘In brief’ might be wordy. Consider a shorter alternative.
Context: ...tic Versioning](https://semver.org/). > In brief: > > * MAJOR version when you make i...

(EN_WORDINESS_PREMIUM_IN_BRIEF)


[uncategorized] ~79-~79: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...version when you add functionality in a backwards compatible manner > * PATCH version when you mak...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~80-~80: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...anner > * PATCH version when you make backwards compatible bug fixes GitHub Actions will take car...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

pyproject.toml (1)

Line range hint 149-152: Approve Ruff linting and isort configuration changes

The changes to the Ruff configuration are beneficial:

  1. Adding "B" to the unfixable list prevents automatic fixes for bugbear checks, which often require human review.
  2. Setting force-single-line = true for isort will format all imports on separate lines, improving readability.

These changes will enhance code quality and consistency.

prymer/offtarget/bwa.py (4)

195-195: Good addition of docstring for BWA_AUX_EXTENSIONS.

The added docstring clearly explains the purpose of the BWA_AUX_EXTENSIONS constant, improving code readability and maintainability.


221-221: Correct update of default executable name.

Changing the default value of the executable parameter from "bwa" to "bwa-aln-interactive" ensures consistency with the module's requirements and previous documentation updates.


236-236: Appropriate update to executable parameter docstring.

The docstring for the executable parameter has been correctly updated to reflect the use of bwa-aln-interactive, maintaining consistency with previous changes.


266-266: Correct update to error message for missing BWA index files.

The error message has been appropriately updated to use the correct executable name when suggesting how to index the reference. The use of f"{executable_path}" ensures that the message remains accurate even if a custom executable path is provided.

prymer/offtarget/offtarget_detector.py (3)

Line range hint 129-167: Excellent improvement to the class docstring!

The expanded docstring for the OffTargetDetector class is a significant enhancement. It now provides:

  1. Clear information about the custom "bwa-aln-interactive" tool being used.
  2. An overview of the class's functionality and main methods.
  3. Important notes on thread safety.
  4. A comprehensive explanation of off-target detection for both individual primers and primer pairs.

These improvements greatly enhance the usability and understanding of the class for potential users.


Line range hint 169-228: Great improvement to the init method docstring!

The expanded docstring for the __init__ method is a significant enhancement:

  1. It now groups parameters by their role in the off-target detection process, making it easier to understand their purpose.
  2. Each parameter is explained in detail, providing crucial information for users of the class.
  3. The docstring clarifies the default value and use case for min_primer_pair_hits, which is particularly helpful.

These improvements will greatly aid users in properly initializing and using the OffTargetDetector class.


Line range hint 229-516: Overall improvements focus on documentation and alignment with custom BWA version

The changes in this file are primarily focused on improving documentation and aligning the OffTargetDetector class with the use of the custom "bwa-aln-interactive" tool. The core functionality of the class remains unchanged, which is good for maintaining compatibility with existing usage.

Key improvements:

  1. Enhanced class and method docstrings provide better clarity on the class's purpose and usage.
  2. The executable parameter now defaults to "bwa-aln-interactive", aligning with the custom BWA version mentioned in the documentation.

These changes should improve the usability and understanding of the OffTargetDetector class without introducing breaking changes to its core functionality.

pyproject.toml Outdated Show resolved Hide resolved
prymer/offtarget/offtarget_detector.py Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
tests/offtarget/test_offtarget.py (1)

40-40: Update function signature to match usage

The addition of executable=BWA_EXECUTABLE_NAME to the OffTargetDetector constructor is good. However, the _build_detector function signature should be updated to include this new parameter for consistency and to prevent potential issues in the future.

Consider updating the _build_detector function signature as follows:

def _build_detector(
    ref_fasta: Path,
    max_primer_hits: int = 1,
    max_primer_pair_hits: int = 1,
    three_prime_region_length: int = 20,
    max_mismatches_in_three_prime_region: int = 0,
    max_mismatches: int = 0,
    max_amplicon_size: int = 250,
    cache_results: bool = True,
    executable: str = BWA_EXECUTABLE_NAME,
) -> OffTargetDetector:
prymer/offtarget/offtarget_detector.py (1)

Line range hint 130-168: LGTM: Improved docstring and flexible executable parameter

The updates to the class docstring and __init__ method are beneficial:

  1. The expanded docstring provides clearer information about the custom BWA version used, which is helpful for users and maintainers.
  2. Changing the executable parameter's default value to BWA_EXECUTABLE_NAME improves flexibility and maintainability.

These changes align well with the updated import statement.

Consider adding a brief explanation in the docstring about the BWA_EXECUTABLE_NAME constant and where it's defined. This could help developers quickly understand where to look if they need to modify the executable name.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between a6205c9 and afdc5cc.

📒 Files selected for processing (5)
  • prymer/offtarget/bwa.py (7 hunks)
  • prymer/offtarget/offtarget_detector.py (3 hunks)
  • pyproject.toml (1 hunks)
  • tests/api/test_picking.py (3 hunks)
  • tests/offtarget/test_offtarget.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • prymer/offtarget/bwa.py
  • pyproject.toml
🧰 Additional context used
🔇 Additional comments (6)
tests/offtarget/test_offtarget.py (2)

11-11: LGTM: Import of BWA_EXECUTABLE_NAME

The addition of this import is appropriate and follows good practices by importing only the specific name needed.


Line range hint 1-284: Verify impact on other test cases

The changes to import and use BWA_EXECUTABLE_NAME are appropriate and align with the PR objectives. However, it's important to ensure that these changes don't inadvertently affect other test cases in this file.

Please run the following command to check if any other test cases might be impacted:

This will help identify any other places where BWA configurations might need to be updated for consistency.

✅ Verification successful

Please run the following command to check for any other occurrences of BWA-related configurations in the test file:


BWA-related configurations are confined to lines 8-14 in tests/offtarget/test_offtarget.py.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any other occurrences of BWA-related configurations in the test file

rg --type python "bwa|BWA" tests/offtarget/test_offtarget.py

Length of output: 93


Script:

#!/bin/bash
# Description: Check for any other occurrences of BWA-related configurations in the test file

rg --type py "bwa|BWA" tests/offtarget/test_offtarget.py

Length of output: 334

prymer/offtarget/offtarget_detector.py (1)

93-93: LGTM: Importing BWA_EXECUTABLE_NAME constant

The import of BWA_EXECUTABLE_NAME from prymer.offtarget.bwa is a good practice. It centralizes the definition of the BWA executable name, which can make future updates easier and more consistent across the codebase.

tests/api/test_picking.py (3)

30-30: Consistent Import of BWA_EXECUTABLE_NAME

Importing BWA_EXECUTABLE_NAME from prymer.offtarget.bwa ensures that the test suite uses the correct executable consistently. This improves maintainability and reduces the risk of discrepancies between the tests and the actual implementation.


589-589: Explicitly Specifying executable Parameter in OffTargetDetector

Adding executable=BWA_EXECUTABLE_NAME to the OffTargetDetector instantiation makes the executable explicit, enhancing clarity and ensuring that the correct BWA executable is used during testing. This change improves test reliability by avoiding any default behaviors that might differ from the intended configuration.


897-897: Ensuring Correct Executable in test_and_pick_primer_pairs

By passing executable=BWA_EXECUTABLE_NAME to the OffTargetDetector within test_and_pick_primer_pairs, you ensure consistency across the test cases. This explicit specification enhances reproducibility and aligns the test setup with the application's expected behavior.

prymer/offtarget/bwa.py Outdated Show resolved Hide resolved
@tfenne tfenne assigned clintval and unassigned tfenne Oct 16, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
prymer/offtarget/bwa.py (1)

63-65: Good addition of BWA_EXECUTABLE_NAME constant

The introduction of this constant improves code maintainability by centralizing the executable name.

Consider expanding the docstring slightly to provide more context:

-"""The executable name for the interactive build of bwa aln."""
+"""The executable name for the interactive build of bwa aln, used for alignment operations."""
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between afdc5cc and d997e13.

📒 Files selected for processing (1)
  • prymer/offtarget/bwa.py (7 hunks)
🧰 Additional context used
🔇 Additional comments (6)
prymer/offtarget/bwa.py (6)

18-23: Excellent addition of installation instructions

The new documentation clearly explains the requirement for bwa-aln-interactive and provides useful links to the GitHub repository and Bioconda package. This addition enhances the usability of the module and helps users set up the necessary dependencies.


Line range hint 224-239: Improved constructor signature and documentation

The changes to the constructor are well-implemented:

  1. Using BWA_EXECUTABLE_NAME as the default value for the executable parameter ensures consistency.
  2. The updated documentation for the executable parameter is clear and informative.

These modifications enhance the usability and maintainability of the class.


269-269: Enhanced error message for missing BWA index files

The updated error message now includes the correct executable path in the suggested indexing command. This improvement makes it easier for users to resolve issues with missing BWA index files, enhancing the overall user experience.


198-198: Typo correction in comment

The typo in the comment for BWA_AUX_EXTENSIONS has been corrected from "extensiosn" to "extensions". This minor change improves the overall quality and readability of the code.


Line range hint 1-458: Overall assessment of changes

The modifications in this file significantly improve the integration of bwa-aln-interactive, enhance documentation, and address several issues mentioned in the linked GitHub issues and past review comments. Key improvements include:

  1. Clear installation instructions for bwa-aln-interactive.
  2. Introduction of a constant for the BWA executable name, improving maintainability.
  3. Updated constructor signature and documentation for better clarity.
  4. Enhanced error messages for missing BWA index files.
  5. Implementation of a workaround for platform-specific newline issues.

These changes align well with the PR objectives and contribute to a more robust and user-friendly codebase. The only area that might benefit from further investigation is the platform-specific newline behavior in the __signal_bwa method.


311-316: Workaround implemented for platform-specific newline issue

The changes address the issue mentioned in past comments about the executable requiring more newlines. The implementation of 16 newlines seems to work across different platforms.

While this workaround is functional, it would be beneficial to understand the root cause of this behavior. Consider the following:

  1. Investigate why different platforms require a different number of newlines.
  2. Explore if there's a more robust solution that doesn't rely on a fixed number of newlines.
  3. Document any findings in the code or in a separate document for future reference.

To help with the investigation, you could run the following script to check if there are any platform-specific configurations or if this behavior is documented:

@clintval clintval merged commit d87e740 into main Oct 16, 2024
7 checks passed
@clintval clintval deleted the cv_docs_upgrade branch October 16, 2024 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants